OPTIMIZATION OF K-NEAREST NEIGHBOUR TO CATEGORIZE INDONESIAN'S NEWS ARTICLES

Abstract

Text classification is the process of grouping documents into categories based on their similarity. One obstacle in text classification is the large number of words that appear in the text, many of which occur only infrequently (sparse words). One way to address this problem is to perform feature selection. There are several filter-based feature selection methods, including Chi-Square, Information Gain, Genetic Algorithm, and Particle Swarm Optimization (PSO). Aghdam's research shows that PSO performs best among these methods. This study examined PSO as a way to optimize the performance of the k-Nearest Neighbour (k-NN) algorithm in categorizing news articles. k-NN is a simple algorithm that is easy to implement, and with appropriate features it becomes a reliable algorithm. PSO is used to select keywords (term features), and the documents are then classified using k-NN. Testing consists of three stages: tuning the k-NN parameter, tuning the PSO parameter, and measuring performance. Parameter tuning aims to determine the number of neighbours and the number of particles, while performance testing compares k-NN with and without PSO. The optimal number of neighbours is 9 and the optimal number of particles is 50. PSO reduced the number of terms by 50%, and the results show 20 per cent better accuracy than k-NN without PSO. Although PSO did not always find the optimal conditions, the method can produce better accuracy. In this way, k-NN works better for categorizing news articles, especially Indonesian-language articles.
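The pipeline described in the abstract (PSO selects term features, then k-NN classifies documents) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard binary PSO variant with a sigmoid transfer function, cosine similarity over term-count vectors, and a fitness equal to validation accuracy minus a small penalty on the fraction of terms kept. The helper names (`cosine`, `knn_predict`, `accuracy`, `pso_select`), the PSO coefficients, and the toy vocabulary are hypothetical. The defaults `k=9` and `n_particles=50` mirror the optimal values reported in the abstract; the toy run uses smaller values because it has only six training documents.

```python
import math
import random
from collections import Counter


def cosine(a, b, mask):
    """Cosine similarity of term vectors a and b over the terms kept by mask."""
    dot = sum(x * y for x, y, m in zip(a, b, mask) if m)
    na = math.sqrt(sum(x * x for x, m in zip(a, mask) if m))
    nb = math.sqrt(sum(y * y for y, m in zip(b, mask) if m))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train_x, train_y, doc, mask, k=9):
    """Label doc by majority vote among its k most similar training documents."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: cosine(train_x[i], doc, mask), reverse=True)
    return Counter(train_y[i] for i in ranked[:k]).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, mask, k=9):
    """Fraction of test documents that k-NN labels correctly under mask."""
    hits = sum(knn_predict(train_x, train_y, d, mask, k) == y
               for d, y in zip(test_x, test_y))
    return hits / len(test_y)


def pso_select(train_x, train_y, val_x, val_y, k=9, n_particles=50,
               iters=30, w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the vocabulary."""
    rng = random.Random(seed)
    dim = len(train_x[0])

    def fitness(mask):
        if not any(mask):
            return 0.0
        # Assumed objective: validation accuracy minus a small penalty on
        # the fraction of terms kept, so smaller masks win ties.
        acc = accuracy(train_x, train_y, val_x, val_y, mask, k)
        return acc - 0.01 * sum(mask) / dim

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                # Sigmoid transfer: the velocity sets P(bit d is 1).
                prob = 1 / (1 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return gbest


# Hypothetical toy corpus: term counts over a tiny Indonesian vocabulary,
# where "hari" and "ini" are noise terms shared by both categories.
vocab = ["gol", "pemain", "liga", "saham", "rupiah", "bank", "hari", "ini"]
train_x = [[2, 1, 1, 0, 0, 0, 3, 2], [1, 2, 0, 0, 0, 0, 2, 3],
           [0, 1, 2, 0, 0, 0, 3, 3], [0, 0, 0, 2, 1, 1, 3, 2],
           [0, 0, 0, 1, 2, 0, 2, 3], [0, 0, 0, 0, 1, 2, 3, 3]]
train_y = ["olahraga"] * 3 + ["ekonomi"] * 3
val_x = [[1, 1, 0, 0, 0, 0, 3, 3], [0, 0, 0, 1, 1, 0, 3, 3]]
val_y = ["olahraga", "ekonomi"]

mask = pso_select(train_x, train_y, val_x, val_y, k=3, n_particles=10, iters=10)
print("kept terms:", [t for t, m in zip(vocab, mask) if m])
```

On a real corpus, the training and validation splits would come from the labelled news articles, and the mask returned by `pso_select` would then be applied when classifying unseen articles with `knn_predict`.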

Similar resources

k-Nearest Neighbour Classifiers

Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance today because issues of poor run-time performance are not such...

Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction

Suppose a bank has a database of people’s details and their credit rating. These details would probably be the person’s financial characteristics such as how much they earn, whether they own or rent a house, and so on, and would be used to calculate the person’s credit rating. However, the process for calculating the credit rating from the person’s details is quite expensive, so the bank would ...

Convergence of random k-nearest-neighbour imputation

Random k-nearest-neighbour (RKNN) imputation is an established algorithm for filling in missing values in data sets. Assume that data are missing in a random way, so that missingness is independent of unobserved values (MAR), and assume there is a minimum positive probability of a response vector being complete. Then RKNN, with k equal to the square root of the sample size, asymptotically produ...

CONNECTIVITY OF RANDOM k-NEAREST-NEIGHBOUR GRAPHS

Let P be a Poisson process of intensity one in a square S_n of area n. We construct a random geometric graph G_{n,k} by joining each point of P to its k ≡ k(n) nearest neighbours. Recently, Xue and Kumar proved that if k ≤ 0.074 log n then the probability that G_{n,k} is connected tends to 0 as n → ∞ while, if k ≥ 5.1774 log n, then the probability that G_{n,k} is connected tends to 1 as n → ∞. They conject...

Small components in k-nearest neighbour graphs

Let G = G_{n,k} denote the graph formed by placing points in a square of area n according to a Poisson process of density 1 and joining each point to its k nearest neighbours. In [2] Balister, Bollobás, Sarkar and Walters proved that if k < 0.3043 log n then the probability that G is connected tends to 0, whereas if k > 0.5139 log n then the probability that G is connected tends to 1. We prove that,...

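The last two entries describe the random graph G_{n,k}: points of a Poisson process of intensity one in a square of area n, each joined to its k nearest neighbours. A small Monte Carlo sketch of that model can estimate how often the graph is connected for given n and k. It assumes brute-force neighbour search and a union-find connectivity check (both illustrative choices), and the function names are hypothetical. Because the cited thresholds are asymptotic, small-n estimates are only indicative.

```python
import math
import random


def poisson_points(n, rng):
    """Sample a Poisson process of intensity 1 in a square of area n."""
    # The point count is Poisson(n): count arrivals of a rate-1 process
    # (exponential gaps) that fall in the interval [0, n].
    count, t = 0, rng.expovariate(1.0)
    while t <= n:
        count += 1
        t += rng.expovariate(1.0)
    side = math.sqrt(n)
    # Given the count, the points are independent and uniform in the square.
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(count)]


def knn_graph_connected(points, k):
    """Join each point to its k nearest neighbours and test connectivity."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            # Edges are undirected, so merging the two components suffices.
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))}) <= 1


def connectivity_rate(n, k, trials=20, seed=0):
    """Fraction of sampled graphs G_{n,k} that are connected."""
    rng = random.Random(seed)
    hits = sum(knn_graph_connected(poisson_points(n, rng), k)
               for _ in range(trials))
    return hits / trials


for k in (1, 2, 3, 4, 6):
    print(k, connectivity_rate(100, k))
```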
Journal

Journal title: Asia-Pacific Journal of Information Technology and Multimedia

Year: 2021

ISSN: 2289-2192

DOI: https://doi.org/10.17576/apjitm-2021-1001-04